Protecting Respondents' Identities in Microdata Release
Abstract
Today’s globally networked society places great demand on the dissemination and sharing of information. While in the past released information was mostly in tabular and statistical form, many situations today call for the release of specific data (microdata). In order to protect the anonymity of the entities (called respondents) to which information refers, data holders often remove or encrypt explicit identifiers such as names, addresses, and phone numbers. De-identifying data, however, provides no guarantee of anonymity. Released information often contains other data, such as race, birth date, sex, and ZIP code, that can be linked to publicly available information to re-identify respondents and to infer information that was not intended for disclosure. In this paper we address the problem of releasing microdata while safeguarding the anonymity of the respondents to which the data refer. The approach is based on the definition of k-anonymity. A table provides k-anonymity if attempts to link explicitly identifying information to its content map the information to at least k entities. We illustrate how k-anonymity can be provided, by using generalization and suppression techniques, without compromising the integrity (or truthfulness) of the information released. We introduce the concept of minimal generalization, which captures the requirement that the release process not distort the data more than needed to achieve k-anonymity, and present an algorithm for computing such a generalization. We also discuss possible preference policies for choosing among different minimal generalizations.
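The ideas in the abstract (k-anonymity, generalization along a value hierarchy, suppression of outlier tuples, and minimal generalizations) can be illustrated with a small Python sketch. It is not the paper's algorithm: it enumerates every combination of generalization levels by brute force instead of searching more cleverly. The quasi-identifier attributes, the hierarchy helpers `zip_level` and `sex_level`, the suppression threshold `max_sup`, and the sample rows are all hypothetical. "Minimal" is read here as "satisfies the constraints and is not strictly more general than another vector that also satisfies them."

```python
from collections import Counter
from itertools import product

# Hypothetical generalization hierarchies: level h of each attribute maps an
# original value to its generalized form (level 0 = the original value).
def zip_level(h):
    # Replace the last h digits of a ZIP code with '*': 02138 -> 0213* -> 021**
    return lambda z: z[: len(z) - h] + "*" * h if h else z

def sex_level(h):
    # Level 1 hides the value entirely: M/F -> *
    return (lambda s: s) if h == 0 else (lambda s: "*")

HIERARCHIES = {
    "zip": [zip_level(h) for h in range(3)],
    "sex": [sex_level(h) for h in range(2)],
}

def generalize(rows, qi, vector):
    """Generalize each row's quasi-identifiers `qi` to the levels in `vector`."""
    return [tuple(HIERARCHIES[a][lvl](row[a]) for a, lvl in zip(qi, vector))
            for row in rows]

def suppressed_count(gen_rows, k):
    """Tuples in groups smaller than k; these would have to be suppressed."""
    return sum(c for c in Counter(gen_rows).values() if c < k)

def satisfies(rows, qi, vector, k, max_sup):
    """k-anonymous after suppressing at most max_sup tuples (assumed threshold)."""
    return suppressed_count(generalize(rows, qi, vector), k) <= max_sup

def dominates(a, b):
    """True if vector a is at least as general as b on every attribute."""
    return all(x >= y for x, y in zip(a, b))

def minimal_generalizations(rows, qi, k, max_sup):
    """Brute force over all level vectors; keep the satisfying, non-dominating ones."""
    levels = [range(len(HIERARCHIES[a])) for a in qi]
    ok = [v for v in product(*levels) if satisfies(rows, qi, v, k, max_sup)]
    return [v for v in ok if not any(w != v and dominates(v, w) for w in ok)]

def release(rows, qi, vector, k):
    """Generalized table with tuples from groups smaller than k suppressed."""
    gen = generalize(rows, qi, vector)
    counts = Counter(gen)
    return [t for t in gen if counts[t] >= k]

QI = ["zip", "sex"]
rows = [
    {"zip": "02138", "sex": "F"},
    {"zip": "02139", "sex": "F"},
    {"zip": "02141", "sex": "M"},
    {"zip": "02142", "sex": "M"},
    {"zip": "02138", "sex": "M"},
]

print(minimal_generalizations(rows, QI, k=2, max_sup=0))  # [(1, 1), (2, 0)]
print(minimal_generalizations(rows, QI, k=2, max_sup=1))  # [(1, 0)]
print(release(rows, QI, (1, 0), k=2))  # ('0213*', 'M') is suppressed
```

Allowing a single suppressed tuple lets the sketch keep the `sex` attribute intact and only coarsen ZIP codes by one digit. With no suppression it finds two incomparable minimal generalizations, which is the situation where a preference policy, such as favoring the attribute whose precision matters most to data users, would have to choose between them. Exhaustive enumeration grows with the product of the hierarchy heights, so it only suits toy examples like this one.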
Similar resources
Microaggregation for Protecting Individual Data Privacy
Microaggregation is a technique for protecting the privacy of respondents in individual data (microdata) releases. This paper starts with a survey of the general definitions and concepts related to microdata protection and then reviews the state of the art of microaggregation, to which our group has substantially contributed.
Data Confidentiality
When releasing data to the public, data disseminators typically are required to protect the confidentiality of survey respondents’ identities and attribute values. Removing direct identifiers such as names and addresses generally is not sufficient to eliminate disclosure risks, so that data must be altered before release to limit the risks of unintended disclosures. When intense data alteration...
What roles do remote servers and synthetic data play in the future of data dissemination? New Approaches to Data Dissemination: A Glimpse into the Future (?) Jerome
Many national statistical agencies and survey organizations disseminate microdata, i.e., data on individual units in public use data files. These data disseminators strive to release files that are safe from attacks by ill-intentioned data users seeking to learn respondents’ identities or sensitive attributes, informative for a wide range of statistical analyses, and easy for users to analyze wi...
ACRONYM: Data without Boundaries D
Disclosure limitation methods for protecting the confidentiality of respondents in survey microdata often use perturbative techniques which introduce measurement error into the categorical identifying variables. In addition, the data itself will often have measurement errors commonly arising from survey processes. There is a need for valid and practical ways to assess the protect...
The Challenges of Preparing Sensitive Data for Public Release
Survey practitioners are challenged to meet the ever-rising demand for microdata files while protecting the confidentiality of the individual data provider. Data processing tools are becoming quite sophisticated, which is an aid to survey practitioners but also a potential tool for data snoops. Additionally, there is a perception that participation in surveys is declining worldwide, partially due to...
Journal title:
- IEEE Trans. Knowl. Data Eng.
Volume 13
Publication date: 2001